You may be focusing on what is not the hardest part of your design here.
If the queue is FIFO with no prioritization then your accessors are just push_back() and pop_front() - very fast even if you skip compare-and-swap (CAS) semantics and stick with a simple mutex/critical section. If you need the ability to prioritize traffic then things get harder. If you do want CAS-style locking then (on Windows at least) you are unlikely to beat boost::thread's shared_mutex without spending far too much time on this part of the code; I'm not sure about the non-Windows implementations.
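For illustration only, a minimal sketch of the plain-mutex FIFO version - WorkItem is a placeholder type and all the names here are mine, not anything from your codebase:

```cpp
#include <deque>
#include <mutex>
#include <optional>

// Hypothetical work item type; substitute your own.
struct WorkItem { /* ... */ };

// Minimal mutex-protected FIFO: push_back() from producers, pop_front() from workers.
class WorkQueue {
public:
    void push_back(WorkItem item) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(item));
    }

    // Returns std::nullopt if the queue is currently empty.
    std::optional<WorkItem> pop_front() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty())
            return std::nullopt;
        WorkItem item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

private:
    std::mutex mutex_;
    std::deque<WorkItem> items_;
};
```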
The more complex part of this problem is typically signalling idle worker threads to pick up new work. You can't have them busy-looping until queue.front() is non-empty, so you need a way to ensure the right number of idle threads get kicked to pick up queued items. When a worker thread goes idle it can check for new work and run it if there is any; if not, the queue state needs to be marked idle so that the next push_back() delivers a "wake up" kick to restart the worker thread pool. This area has to be 100% robust to all non-fatal exceptions or your process will quietly go dark.
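A rough sketch of one common way to do that kick, using a condition variable so idle workers sleep rather than spin (again, all names are placeholders):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

struct WorkItem { /* ... */ };

std::mutex              queue_mutex;
std::condition_variable queue_cv;
std::deque<WorkItem>    pending;
bool                    stopping = false;

// Producer side: the push is the "kick" that wakes one idle worker.
void enqueue(WorkItem item) {
    {
        std::lock_guard<std::mutex> lock(queue_mutex);
        pending.push_back(std::move(item));
    }
    queue_cv.notify_one();
}

// Worker side: sleeps (no busy loop) until kicked, exits on shutdown.
void worker_loop() {
    for (;;) {
        WorkItem item;
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            queue_cv.wait(lock, [] { return stopping || !pending.empty(); });
            if (stopping && pending.empty())
                return;
            item = std::move(pending.front());
            pending.pop_front();
        }
        try {
            // process(item);  // your work here
        } catch (...) {
            // log and carry on; an escaping exception kills this worker thread
        }
    }
}
```

The try/catch around the processing step is where the "robust to non-fatal exceptions" requirement bites: an exception escaping worker_loop() silently kills that thread and shrinks your pool.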
Are you managing your own threads or using a built-in thread pool? Are you planning to have a dynamically-sized thread pool or just spawn N threads (configurable presumably) and have them run until process exit?
Definitely have the worker threads do the logging of work item progress. Knowing who owns a workitem at every point in its lifecycle is vital. Logging start/stop of work, plus a workitem summary and timing, is going to be useful. If logging is slow then push it off to a separate thread via a fire-and-forget queue, but then watch out for latency there making your log less useful. If you do need the ability to externally manipulate in-progress workitems then a structure separate from your queue of pending work - in-progress work items indexed by thread, showing current status and start time, with its own locking - sounds like a good idea. That structure is O(thread count), much smaller than the "pending" queue, so scanning it is unlikely to be a bottleneck provided any long-running follow-up operations are done outside the structure's lock.
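Roughly what that separate in-progress structure could look like - purely a sketch with invented names, assuming workitems can be identified by a string id:

```cpp
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// One entry per busy worker thread: what it is doing and since when.
struct InProgressEntry {
    std::string workitem_id;
    std::string status;
    std::chrono::steady_clock::time_point started;
};

class InProgressTable {
public:
    void begin(const std::string& id) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_[std::this_thread::get_id()] =
            InProgressEntry{ id, "running", std::chrono::steady_clock::now() };
    }

    void end() {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.erase(std::this_thread::get_id());
    }

    // Copy out a snapshot for monitoring/timeout scans; the table is
    // O(thread count), so the lock is held only briefly and any follow-up
    // work happens outside it.
    std::map<std::thread::id, InProgressEntry> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::thread::id, InProgressEntry> entries_;
};
```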
Regarding performance - what are your worker threads actually going to be doing? If work items are long-running, or do a lot of I/O or other expensive operations, then the queue interaction is not your performance bottleneck anyway, so over-optimizing that area is relatively unproductive. Consider the perf of the whole system in your design, not just one small area.
This is just for starters. Good luck, this is not an easy system to design robustly.
[EDIT] based on workitem description.
Parsing should be quick (though it may involve costly source-data retrieval - hard to say?), DB access less so. It sounds like tuning the DB may be your biggest bang for the buck perf-wise. If you don't control the DB then you just have to mitigate its slowness in your design as much as possible. If you have the option of async DB access, the worker thread could do just enough work to kick off the DB call and then finish the work item in a callback, freeing the worker thread to pick up other work in the meantime. Without async DB access, a reliable request timeout is hard to implement unless you add some layer of indirection so the main worker thread does not wait inline for DB calls to complete. You need to decouple your main worker threads from the DB unless you can trust it to return or error out in a timely way. Maybe a configurable or workitem-specific timeout on the DB request? DB API libraries often allow this.
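If your DB library only offers blocking calls, one crude way to get a timeout is to run the call on another thread and stop waiting after a deadline. The sketch below uses std::packaged_task purely as an illustration; run_db_query is a hypothetical stand-in for your real client call, and note that this does not cancel the underlying query - true cancellation needs support from the DB API itself:

```cpp
#include <chrono>
#include <future>
#include <stdexcept>
#include <string>
#include <thread>

// Stand-in for a blocking DB call; substitute your real client library.
std::string run_db_query(const std::string& sql);

// Runs the query on a detached thread and gives up waiting after `timeout`.
// The underlying call keeps running to completion in the background;
// real cancellation needs per-request timeouts or an async interface
// from the DB API itself.
std::string query_with_timeout(const std::string& sql,
                               std::chrono::milliseconds timeout) {
    std::packaged_task<std::string()> task([sql] { return run_db_query(sql); });
    std::future<std::string> fut = task.get_future();
    std::thread(std::move(task)).detach();

    if (fut.wait_for(timeout) != std::future_status::ready)
        throw std::runtime_error("DB query timed out");
    return fut.get();
}
```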
Your timeout monitor would need to stay aware of workitem state. A virtual Cancel() method on your workitem would give you flexibility in cleaning up timed-out items.
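One shape that virtual Cancel() idea could take - purely illustrative, with invented class names; each concrete item decides what cancellation actually means for it:

```cpp
#include <atomic>

// Base class for work items: the timeout monitor calls Cancel() on items
// that have overrun; each derived item decides what cleanup that implies
// (abort a DB request, close a connection, or just set a flag the worker checks).
class WorkItemBase {
public:
    virtual ~WorkItemBase() = default;

    virtual void Cancel() { cancelled_ = true; }

    bool IsCancelled() const { return cancelled_; }

private:
    std::atomic<bool> cancelled_{false};
};

class ParseAndStoreItem : public WorkItemBase {
public:
    void Cancel() override {
        WorkItemBase::Cancel();
        // e.g. also abort the in-flight DB request here, if the API allows it
    }
};
```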