It is hard to say what the main problem is in your case, but it is most certainly not something that can be corrected with a call to `sched_yield()` or `pthread_yield()`. The only well-defined use for yielding in Linux is to let a different ready thread take over from the currently running CPU-bound thread at the same priority, on the same CPU, under the SCHED_FIFO scheduling policy. Relying on that is a poor design decision in almost all cases.
If you're serious about your goal of "attempting to be real-time" in Linux, then first of all, you should be using a real-time `sched_setscheduler` setting (SCHED_FIFO or SCHED_RR, FIFO preferred).
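As a rough sketch of that first step, the snippet below switches the calling thread to SCHED_FIFO. The helper names `clamp_fifo_priority` and `make_realtime` are made up for illustration, and the priority you pass is your own choice. The call normally needs root or `CAP_SYS_NICE` (or a suitable `RLIMIT_RTPRIO`), so expect it to fail with `EPERM` otherwise.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: clamp a requested priority into the range the
 * kernel accepts for SCHED_FIFO. The range is queried, not hard-coded. */
static int clamp_fifo_priority(int requested)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}

/* Hypothetical helper: put the calling thread under SCHED_FIFO.
 * Returns 0 on success, -1 on failure (errno is left as set by
 * sched_setscheduler, typically EPERM without the right privileges). */
static int make_realtime(int requested_priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof param);
    param.sched_priority = clamp_fifo_priority(requested_priority);

    /* A pid of 0 means "the caller". */
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

You would call something like `make_realtime(50)` early in `main()`, before starting the time-critical work, and treat a `-1` return as "still running under the normal scheduler".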
Second, get the full preemption patch (PREEMPT_RT) for Linux, from kernel.org if your distro does not supply one. It will also give you the ability to reprioritize device driver threads and to run your thread at a higher priority than, say, the hard disk or ethernet driver threads.
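To check whether the kernel you are running actually has the patch, one approach is to look for `/sys/kernel/realtime`. This is a sketch that assumes the file is present and contains `1` on fully preemptible kernels, which is how the patch has commonly exposed itself; the function name `kernel_is_preempt_rt` is hypothetical, and `uname -v` showing `PREEMPT_RT` (or `PREEMPT RT`) is another hint worth checking.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the running kernel reports itself as
 * fully preemptible (PREEMPT_RT), 0 otherwise.
 * Assumption: such kernels expose /sys/kernel/realtime containing "1". */
static int kernel_is_preempt_rt(void)
{
    FILE *f = fopen("/sys/kernel/realtime", "r");
    int flag = 0;

    if (f == NULL)
        return 0; /* file absent: treat as a non-RT kernel */

    if (fscanf(f, "%d", &flag) != 1)
        flag = 0; /* unreadable contents: treat as a non-RT kernel */

    fclose(f);
    return flag == 1;
}
```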
Third, see RTWiki and other resources for more hints on how to design and set up a real-time application.
This should be enough to get you under 10 *micro*seconds response time, regardless of system load, on any decent desktop system. I have an embedded system where I can only get about 60 µs response time when idle and 150 µs under heavy disk/system load, but that is still orders of magnitude faster than what you're describing.
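If you want to see what your own system achieves, a crude measurement is to sleep until an absolute deadline and record how late the wake-up is. The sketch below does that with `clock_nanosleep`; the function name `max_wakeup_latency_ns` and its parameters are made up for illustration, and it only measures timer wake-up latency, not your application's full response path.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: wake up `iterations` times, once every `period_ns`
 * nanoseconds, and return the worst observed lateness in nanoseconds.
 * Returns -1 if a clock call fails. */
static int64_t max_wakeup_latency_ns(int iterations, long period_ns)
{
    struct timespec next, now;
    int64_t worst = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        /* Advance the absolute deadline by one period. */
        next.tv_nsec += period_ns;
        while (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }

        /* Sleep until the deadline; retry if a signal interrupts us.
         * clock_nanosleep returns the error number directly. */
        int rc;
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return -1;

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        /* Lateness = actual wake-up time minus requested deadline. */
        int64_t late = (int64_t)(now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                     + (int64_t)(now.tv_nsec - next.tv_nsec);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Run it from a thread that is already under SCHED_FIFO, once on an idle machine and once under load, for example `max_wakeup_latency_ns(10000, 1000000L)` for ten seconds of 1 ms periods. For serious numbers, the `cyclictest` tool from the rt-tests suite does the same kind of measurement far more thoroughly.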