views:

458

answers:

2

I have a queuing mechanism in C on Unix. It accepts XML transactions. Some transactions contain records to be stored. Other transactions request those transactions. The transactions get stored in a file, which is a home-grown queue. First in, first out, very simple. Header area at start of file, keeps track of next position to read from, and next position to write to. We use file locking, but not semaphores as retrieval is polled from remote systems. And there's only one program that accesses the queues. It's in C. Been working fine for years.

Now we have to expand the system. The transactions will contain an extra XML tag. We have to selectively retrieve based on the values of that tag. We are going from a simple queue to a priority queue. There can be many different values in the tag. Say AX, BX, CX, FL and TS. Transactions get added to the queue in order received. We need to be able to retrieve them either in order they were received, or retrieve the next transaction where the tag is FL. Or TS. Or (CS or FL). Or not AX.

How best to do this?

Simple and fast are what we need. Several options come to mind:

  1. Use something like Berkely DB to turn the queue into a database of sorts.
  2. Tap into a PostgreSQL database, create a table that can be used as a priority queue.
  3. Find a C library that will do what we want.
  4. Write our own disk-based priority queue.

We have some constraints. Time is ticking away and this needs to be done in a few weeks. C for fast insertion into the system. Maybe Python if we can tapdance fast enough to convert all the other business logic in the program that accesses the queue. Prefer not to use PostgreSQL as we have no control over the database system and the DBA has nasty habits over what he considers "his" and we have no reliability of uptime even though this is a critical system. Politics, huh!! DBA has also said that using a PostgreSQL table is not an efficient way of doing it. We prefer something that is localized so we can control it. Got to be lightning fast to handle a lot of transactions per minute.

I'm open to any suggestions, even far-out ones. The more suggestions the better.

A: 

It sounds like all you really need is;

  1. A way to iterate over records while checking for a tag.
  2. A way to mark records as NULL, so records selected out of order are skipped via regular processing.

I would suggest the quickest change would be to have each record contain a small header in the file that includes this info (length of record, isValid, tag-info etc). You then start at the first record as normal and iterate over all the records till you encounter one with a tag, and when you do you mark this record invalid so regular processing will ignore it.

Long-term you may want to consider something like sqllite which is freely available, works with regular files, and can be compiled into your executable. For most types of record searching it's likely to be faster than something you can quickly roll yourself, and a heck lot more flexible :)

So I would go with the quick change now and consider reworking your data format in the near future, be it with sqllite or something else. It sounds like the requirements for your queue are becoming more complex so now would be the time to consider something that can be further expanded in the future.

Andrew Grant
I'll edit the question to explain the tag a bit better.Tell the truth, we do use file locking just in case.
codebunny
sqllite, yeah, good thought. Bound to be faster than rolling it ourselves.
codebunny
mark records as null, yes. This might actually be the fastest way to do it, Minimal code changes. Nice.
codebunny
I agree it should be a WTF, but I had to change it. I've been thinking and I believe I'll go with your suggestion. Thanks. Long-term, we'll do something sensible (sure, we'll be given the time) but this will get me going now.
codebunny
A: 

First thing that comes to my mind is to take some memory based priority queue implementation (should be plenty), and lay it on top of your own memory allocation routines which use a mmap'd file for their memory pool.

Jay Kominek