I'm trying to find the best way to use 'top' as semi-permanent instrumentation in the development of a box running embedded Linux. (The instrumentation will be removed from the final-test and production releases.)

My first pass is to simply add this to init.d:

top -b -d 15 >/tmp/toploop.out &

This runs top in "batch" mode every 15 seconds. Let's assume that /tmp has plenty of space...

Questions:

  1. Is 15 seconds a good value to choose for general-purpose monitoring?
  2. Other than disk space, how seriously is this perturbing the state of the system?
  3. What other (perhaps better) tools could be used like this?

Thanks in advance for your answers!

+1  A: 

You might find that vmstat and iostat, run with a delay and no repeat count, are a better option.
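
As a rough sketch of how that could look in the init script: the helper name start_sampler and the /tmp/TOOL.out file names are made up for illustration, and it assumes your image actually ships vmstat and iostat (small embedded builds may not).

```shell
#!/bin/sh
# start_sampler TOOL INTERVAL
# Hypothetical helper: runs TOOL with a delay and no repeat count, so it keeps
# sampling until it is killed. Output goes to /tmp/TOOL.out (an assumed path).
start_sampler() {
    tool=$1
    interval=$2
    # Skip tools that are not installed instead of failing the whole script.
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not installed, skipping" >&2
        return 1
    fi
    "$tool" "$interval" >"/tmp/$tool.out" 2>&1 &
    echo $!    # print the PID so the caller can stop the sampler later
}

# Example calls for an init script (left commented out here):
# start_sampler vmstat 15
# start_sampler iostat 15
```

Keeping the PID around makes it easy to stop the samplers again before a final-test run.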

Paul Tomblin
+1  A: 

I suspect 15 seconds would be more than adequate unless you actually want to watch what's happening in real time, but that doesn't appear to be the case here.

As far as load goes: on an idling PIII 900 MHz with 768 MB of RAM running Ubuntu (not sure which version, but not more than a year old), I have top updating every 0.5 seconds and it uses about 2% CPU. At 15-second updates, I'm seeing 0.1% CPU utilization.

Depending on exactly what you want, you could use the output of uptime, free, and ps to get most, if not all, of top's information.
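
A minimal sketch of that idea, assuming a POSIX shell: the function name snapshot and the header format are my own, and each tool is only called if it exists on the box.

```shell
#!/bin/sh
# snapshot: print one timestamped sample built from uptime, free and ps.
snapshot() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
    for tool in uptime free; do
        # Only call tools that are actually installed on this image.
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool"
        fi
    done
    # On procps, plain ps shows only the current terminal's processes;
    # use "ps -e" there if you want the full process list.
    if command -v ps >/dev/null 2>&1; then
        ps
    fi
}
```

To mirror the original top loop you could then run something like `while :; do snapshot; sleep 15; done >>/tmp/snaploop.out &` (the output file name is again just an example).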

theraccoonbear
+2  A: 

We use sysstat to monitor things like this.

Steve K
+2  A: 

Look at collectd. It's a very lightweight system-monitoring framework written for performance.

David Schmitt
+1  A: 

If you are looking for overall load, uptime is probably sufficient. However, if you want specific information about processes, you are adventurous, and you have the /proc filesystem enabled, you may want to write your own tools. The primary benefit in this environment is that you can focus on exactly what you want and minimize the load introduced to the system.

The proc filesystem gives your application read access to the kernel data that keeps track of many of the interesting variables. Reading from /proc is one of the lightest ways to get this information. Additionally, you may be able to get more information than top provides. I've done this in the past to get the amount of time a process spent in user and system mode. You can also use it to get the number of file descriptors a process has open, or detailed information about how the network subsystem is behaving.

Much of this information is pre-processed by other applications, which you can use if they give you the information you need. However, it is rather straightforward to read the raw information. See man proc for more information.
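
For instance, the user/system time and open-descriptor count mentioned above can be read from plain shell. This is only a sketch: proc_sample is a made-up name, the field positions are those documented in man proc for /proc/[pid]/stat (utime and stime are fields 14 and 15, in clock ticks), and reading another user's fd directory typically needs root.

```shell
#!/bin/sh
# proc_sample PID: print "PID utime stime open_fds" for one process.
proc_sample() {
    pid=$1
    # Read the single-line stat record; fail quietly if the process is gone.
    stat=$(cat "/proc/$pid/stat" 2>/dev/null) || return 1
    # Drop the leading "pid (comm) ". The command name may itself contain
    # spaces or parentheses, so cut at the last ") " in the line.
    rest=${stat##*) }
    # rest now starts at field 3 (state), so field 14 is word 12, field 15 is word 13.
    set -- $rest
    utime=${12}
    stime=${13}
    # Each entry in /proc/PID/fd is one open file descriptor.
    fds=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    echo "$pid $utime $stime $fds"
}
```

Calling `proc_sample 1` every so often and diffing successive lines shows where CPU time is going; if getconf is available on the box, dividing the tick counts by `getconf CLK_TCK` converts them to seconds.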

terson
+1. Yep, I'm quite familiar with /proc! I was hoping not to have to write something, but you are correct about the advantage of being able to focus on exactly what you're after. OK, where's my emacs window...
Kevin Little
+1  A: 

Pity you haven't said what you are monitoring for.

  1. You should decide whether 15 seconds is OK or not. Feel free to drop it much lower if you wish (and have a fast HDD).
  2. No worries unless you are running a soft real-time system.
  3. Have a look at the tools suggested in the other answers. I'll add another suggestion: "iotop", for answering "who is thrashing the HDD" questions.
ADEpt
+1  A: 

At work, for system monitoring during stress tests, we use a tool called nmon.

What I love about nmon is it has the ability to export to XLS and generate beautiful graphs for you.

It generates statistics for:

  • Memory Usage
  • CPU Usage
  • Network Usage
  • Disk I/O

Good luck :)

Nick Stinemates