I'm trying to do some tuning for Oracle on Linux boxes living on SAN based infrastructure. I'm looking specifically for tools that would allow us to profile IO per process (or per process tree would be even better). My questions are:

  • What are the tools that would be recommended for this kind of task?
  • What other useful metrics should I seek to measure on a SAN based infrastructure?
+1  A: 

Once you start to get this specialized, I've found that the easiest thing to do is to write some custom scripts that pull information from files under /proc.

If you're doing analysis for which you don't already have a tool that gives you the exact report you need, you're probably going to end up doing some scripting anyway, and most of the tools you'd use under Linux just go to /proc for their information and then reformat it for you.

If you're more into the databasey side of things, pulling info from /proc on a regular basis, adding timestamps, and recording it in a form that can be imported into an RDBMS can be very useful. This can be particularly good if you put all of your server and process performance information into a single RDBMS, because then you can compare arbitrary things, such as the performance of the same application on different servers.

Keep in mind that if you go further with this, you may start adding information from different sources, such as IPMI monitoring of hosts, so don't do things that you'll have to undo once you're using more than /proc.
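
As a rough illustration of that approach, here is a minimal sketch in Python (assuming a kernel built with CONFIG_TASK_IO_ACCOUNTING, so that /proc/<pid>/io exists; the PID arguments and CSV layout are just placeholders):

    #!/usr/bin/env python
    # Sample /proc/<pid>/io for a set of PIDs and emit timestamped CSV rows
    # that can later be bulk-loaded into an RDBMS table.
    import sys, time

    def read_proc_io(pid):
        # Parse the "key: value" counters (rchar, read_bytes, write_bytes, ...)
        stats = {}
        with open("/proc/%d/io" % pid) as f:
            for line in f:
                key, value = line.split(":")
                stats[key.strip()] = int(value)
        return stats

    def main(pids, interval=5):
        print("timestamp,pid,read_bytes,write_bytes")
        while True:
            now = int(time.time())
            for pid in pids:
                try:
                    io = read_proc_io(pid)
                except IOError:  # process exited, or not readable without root
                    continue
                print("%d,%d,%d,%d" % (now, pid, io["read_bytes"], io["write_bytes"]))
            time.sleep(interval)

    if __name__ == "__main__":
        main([int(p) for p in sys.argv[1:]])

Run it against the Oracle PIDs you care about and redirect the output to a file; the resulting CSV loads easily into whatever RDBMS you use for the comparisons described above. Extending it to a process tree would mean walking the PPid fields in /proc/<pid>/status.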

Curt Sampson
+1  A: 

I have used "iotop" with great results. It reports IO usage per process.

It works like "top".

http://guichaz.free.fr/iotop/

I am not sure, though, whether it makes sense to run it from a Linux box that has the SAN mounted, or whether you wanted a tool that could run within the SAN itself.
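
For what it's worth, a rough usage sketch (exact options depend on the iotop version, and iotop needs a 2.6.20+ kernel with per-task IO accounting enabled):

    # interactive, show only processes currently doing IO
    iotop -o

    # batch mode with timestamps every 5 seconds, suitable for logging
    iotop -obt -d 5 > iotop.log

Note that it measures IO as seen by the Linux host where it runs (the box with the LUNs mounted), not from inside the SAN array itself.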

alfredodeza
A: 

You can use the sysstat utilities, a collection of performance monitoring tools for Linux. The pidstat tool included in sysstat covers the per-process IO case; a usage sketch follows the feature list below.

From the website (perso.orange.fr/sebastien.godard/):

    * Can monitor a huge number of different metrics:

     1. Input / Output and transfer rate statistics (global, per device, per partition, per network filesystem and per Linux task / PID)
     2. CPU statistics (global, per CPU and per Linux task / PID), including support for virtualization architectures
     3. Memory and swap space utilization statistics
     4. Virtual memory, paging and fault statistics
     5. Per-task (per-PID) memory and page fault statistics
     6. Global CPU and page fault statistics for tasks and all their children
     7. Process creation activity
     8. Interrupt statistics (global, per CPU and per interrupt, including potential APIC interrupt sources)
     9. Extensive network statistics: network interface activity (number of packets and kB received and transmitted per second, etc.) including failures from network devices; network traffic statistics for IP, TCP, ICMP and UDP protocols based on SNMPv2 standards; support for IPv6-related protocols.
    10. NFS server and client activity
    11. Socket statistics
    12. Run queue and system load statistics
    13. Kernel internal tables utilization statistics
    14. System and per Linux task switching activity
    15. Swapping statistics
    16. TTY device activity
    17. Power management statistics
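
For the per-process IO question specifically, item 1 above maps to pidstat. A rough sketch (option names as in recent sysstat releases; the PID is a placeholder, and per-task IO reporting needs kernel IO accounting):

    # per-process IO statistics every 5 seconds
    pidstat -d 5

    # the same, restricted to a single PID (e.g. one Oracle shadow process)
    pidstat -d -p 12345 5

    # per-device view, useful for spotting a saturated SAN LUN
    iostat -x 5
    sar -d 5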
Cydork
A: 

I usually use atop to monitor the load on my systems. Some features require a patched kernel, but it gives precise information about I/O as well as other metrics.

Fred
A: 

What other useful metrics should I seek to measure on a SAN based infrastructure?

CPU load. It's the main metric for an Oracle database.

vitaly.v.ch
A: 

Depending on how low-level you want to get, SystemTap could be very useful for you. It is similar to DTrace on Solaris.
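
As a rough illustration of what a SystemTap approach looks like, here is a sketch that totals bytes read and written per process every 5 seconds. It assumes SystemTap is installed with matching kernel debuginfo and uses the vfs tapset, so probe names and availability may vary with your SystemTap and kernel versions:

    # io_per_process.stp -- approximate per-process IO via the vfs tapset
    global reads, writes

    probe vfs.read.return {
        if ($return > 0)
            reads[pid(), execname()] += $return
    }

    probe vfs.write.return {
        if ($return > 0)
            writes[pid(), execname()] += $return
    }

    # every 5 seconds, print the top readers and reset the counters
    probe timer.s(5) {
        printf("%-8s %-16s %12s %12s\n", "PID", "COMMAND", "READ_B", "WRITE_B")
        foreach ([p, cmd] in reads- limit 10)
            printf("%-8d %-16s %12d %12d\n", p, cmd, reads[p, cmd], writes[p, cmd])
        delete reads
        delete writes
    }

Run it as root with "stap io_per_process.stp". Keep in mind this counts VFS-level reads and writes, which is not quite the same as physical IO hitting the SAN, since page cache hits are included.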