views:

128

answers:

1

I have an embedded Linux system running on an Atmel AT91SAM9260EK board on which I have two processes running at real-time priority. A manager process periodically "pings" a worker process using POSIX message queues to check the health of the worker process. Usually the round-trip ping takes about 1ms, but very occasionally it takes much longer - about 800ms. There are no other processes that run at a higher priority.

It appears the stall may be related to logging (syslog). If I stop logging the problem seems to go away. However it makes no difference if the log file is on JFFS2 or NFS. No other processes are writing to the "disk" - just syslog.

What tools are available to me to help me track down why these stalls are occurring? I am aware of latencytop and will be using that. Are there some other tools that may be more useful?

Some details:

  • Kernel version: 2.6.32.8
  • libc (syslog functions): uClibc 0.9.30.1
  • syslog: busybox 1.15.2
  • No swap space configured [added in edit]
  • root filesystem is on tmpfs (loaded from initramfs) [added in edit]
+2  A: 

The problem is (as you said) syslogd. While your process is running at a RT priority, syslogd is not. Additionally, syslogd does not lock its heap and can (and will) be paged out by the kernel, especially with very few 'customers'.

What you could try is:

  • Start another thread to manage a priority queue, have that thread talk to syslog. Logging would then just be acquiring a lock and inserting something into a list. Given only two subscribers, you should not spend a lot of time acquiring the mutex.

  • Not using syslog, implement your own logging (basically the first suggestion, minus talking to syslog).

I had a similar problem and my first attempt to fix it was to modify syslogd itself to lock its heap. That was a disaster. I then tried rsyslogd, which improved some but I still got random latency peaks. I ended up just implementing my own logging using a priority queue to help ensure that more critical messages were actually written first.

Note, if you are not using swap (at all), the shortest path to fixing this is probably implementing your own logging.

Tim Post
Not using swap. Furthermore, the rootfs is a tmpfs (initramfs) so the program text (busybox) would be paged from ram.
camh
@camh - I hate not making more suggestions to solve your problem at hand, but in the time it takes to investigate this, you could have implemented your own logging.
Tim Post
@Tim Post - I dont think the problem is with syslog itself, but somewhere in the kernel. Busybox syslog is pretty simple, and reimplementing that wont vary too much, since I need the ability to collect logs from multiple applications into a single rotating log (based on size). Besides, I have to know why this is stalling. I can't have it come back in the field. Thanks for your answer.
camh