networked day to day technical issues


Visualize sar reports with awk and gnuplot

On several systems where only sar (part of sysstat) collects and stores performance data, I needed to troubleshoot performance issues that had occurred several hours earlier. Sar is a great tool, but it is annoying that it has no option to show output from different reports (such as CPU usage, memory usage and disk usage) together on the same page. If you request those three at the same time, it prints each report on its own page, and from there it is hard to see how each performance indicator evolved at a specific point in time. One solution would have been to load the data into a spreadsheet application and use the vlookup function to group the data, but this is time consuming and, with my spreadsheet skills, I don't think it can be automated.

I used awk to create a report from sar output, choosing the fields I consider useful 95% of the time. Because my display resolution width is 900 I managed to squeeze in a lot of fields. To get a report for the 18th from 10 AM to 6 PM I use:

(for i in -u -r -q "-I SUM" -c -w -W -b; do LC_TIME="en_US.UTF-8" sar -s 10:00:00 -e 18:00:01 -f /var/log/sysstat/sa18 $i| tail -n +4 | head -n -1; echo "==========="; sleep 1; done) | awk -f report.awk
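
If you run this for different days or time windows, the loop can be wrapped in a small shell function. This is only a sketch: collect_sar and the SAR_PAUSE variable are names made up for illustration and are not part of sysstat. The sar options and the tail/head trimming are the ones from the command above (head -n -1 needs GNU head).

```shell
# collect_sar FILE START END
# Hypothetical wrapper around the loop above: prints the eight sar reports
# for the given data file and time window, each followed by a separator line.
collect_sar() {
    sa_file=$1
    start=$2
    end=$3
    for opt in -u -r -q "-I SUM" -c -w -W -b; do
        # $opt is left unquoted on purpose so that "-I SUM" becomes two arguments
        LC_TIME="en_US.UTF-8" sar -s "$start" -e "$end" -f "$sa_file" $opt \
            | tail -n +4 | head -n -1
        echo "==========="
        # same pause as in the original loop; SAR_PAUSE=0 disables it (assumed knob)
        sleep "${SAR_PAUSE:-1}"
    done
}
```

It would then be used as: collect_sar /var/log/sysstat/sa18 10:00:00 18:00:01 | awk -f report.awk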

The code for report.awk is:

BEGIN {
	print "time", "Cpu%usr", "Cpu%sys", "Cpu%io", "Cpu%idl", "Mem%usd", "%Cache+Buff", "Swp%usd", "pswpin/s", "pswpout/s", "runq-sz", "ldavg-5", "IRQ/s", "proc/s", "cswch/s", "  tps", " rtps", " wtps"
}
# a separator line marks the end of one sar report; round counts the reports read so far
$1 == "===========" { round += 1; next }
{
	if ( $2 != "AM" && $2 != "PM" ) { print "\n\nsecond column does not contain AM / PM time marker, quitting"; failed = 1; exit 1 }
	# time formatting: change from AM/PM to 24-hour format and remove seconds
	if ( $2 == "AM" && substr($1,1,2) == 12 ) { fulltime = "00:" substr($1,4,2) }
	else if ( $2 == "PM" && substr($1,1,2) == 12 ) { fulltime = "12:" substr($1,4,2) }
	else if ( $2 == "PM" ) { fulltime = substr($1,1,2)+12 ":" substr($1,4,2) }
	else { fulltime = substr($1,1,5) }
	# ASSUMPTION: apart from the sar -r fields (taken from the commented-out print below),
	# the field numbers used here assume the sysstat 7.x column order; check them against your sar headers
	# cpu usage; sar -u
	if ( round == 0 ) { times[++count] = fulltime; cpu_user[fulltime] = $4; cpu_system[fulltime] = $6; cpu_iowait[fulltime] = $7; cpu_idle[fulltime] = $NF }
	# mem/swap usage; sar -r
	if ( round == 1 ) {
		# print "buff + cache="($6+$7)" total mem="($3+$4) " "($6+$7)*100/($3+$4) " mem used=" $5
		mem_used[fulltime] = $5; mem_cache_and_buffers[fulltime] = ($6+$7)*100/($3+$4); swap_used[fulltime] = $10
	}
	# run queue size and load average; sar -q
	if ( round == 2 ) { runq_sz[fulltime] = $3; ldavg_5[fulltime] = $6 }
	# IRQ/s; sar -I SUM
	if ( round == 3 ) { intr_s[fulltime] = $4 }
	# proc/s - total number of processes created per second; sar -c
	if ( round == 4 ) { proc_s[fulltime] = $3 }
	# total number of context switches per second; sar -w
	if ( round == 5 ) { cswch_s[fulltime] = $3 }
	# swap activity; sar -W
	if ( round == 6 ) { pswpin_s[fulltime] = $3; pswpout_s[fulltime] = $4 }
	# I/O transfers to disks; sar -b
	if ( round == 7 ) { tps[fulltime] = $3; rtps[fulltime] = $4; wtps[fulltime] = $5 }
}
END {
	if ( failed ) exit 1
	# one output line per timestamp, in the order seen in the first (cpu) report
	for (i = 1; i <= count; i++) {
		time = times[i]
		printf ("%s %6g  %6g %6g  %6g  %6g       %2.2f  %6g    %5d     %5d     %3g  %6g %5d %6d %7d %5d %5d %5d\n",time, cpu_user[time], cpu_system[time], cpu_iowait[time], cpu_idle[time], mem_used[time], mem_cache_and_buffers[time], swap_used[time], pswpin_s[time], pswpout_s[time], runq_sz[time], ldavg_5[time], intr_s[time], proc_s[time], cswch_s[time], tps[time], rtps[time], wtps[time])
	}
}
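
The AM/PM to 24-hour conversion is the part most likely to go wrong, so it can be handy to try it in isolation. The snippet below is a minimal sketch of the same rule (12 AM becomes 00, 12 PM stays 12, other PM hours get 12 added, seconds are dropped). The function name to24h is invented for this example.

```shell
# to24h: read "HH:MM:SS AM|PM" lines on stdin and print HH:MM in 24-hour form.
to24h() {
    awk '{
        hour = substr($1, 1, 2)
        min  = substr($1, 4, 2)
        if ($2 == "AM" && hour + 0 == 12)      hour = "00"       # midnight hour
        else if ($2 == "PM" && hour + 0 != 12) hour = hour + 12  # 01 PM..11 PM
        print hour ":" min
    }'
}

printf '12:05:01 AM\n09:30:01 AM\n12:10:01 PM\n01:20:01 PM\n' | to24h
```

For the four sample timestamps this prints 00:05, 09:30, 12:10 and 13:20, one per line.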

With this taken care of, the next problem was that I also wanted some graphs of the data, as they might help even more in understanding a particular situation, and they are also useful to show to management. There is ksar, which is very useful but has no option to show custom-selected fields from sar's performance data on the same graph or on the same page, so I used Gnuplot to get something useful (though it doesn't look as good as rrdtool's graphs). If I have the time I may do the same graphs using rrdtool, as they look a lot better.

The code for gnuplot (file name report.plot) is:

unset output
set terminal png size 1200, 800
set output "output.png"
set origin 0,0
set multiplot
set size 1,0.4
set origin 0,0.6
set timefmt "%H:%M"
set xdata time
set format x "%H:%M"
set yrange [0:100]
plot 'input' using 1:($2+$3+$4) with filledcurve x1 lc rgb "grey" title 'cpu usage %', '' using 1:2 with lines title 'Cpu usage %userland', '' using 1:3 with lines title 'Cpu usage %sys', '' using 1:4 with lines title 'Cpu usage %I/O wait', '' using 1:6 with linespoints pt 1 title 'Memory %used', '' using 1:7 with linespoints pt 2 title 'Mem %buffers+disk cache', '' using 1:8 with linespoints pt 3 title 'Swap %used'
set yrange [0:*]
set size 1,0.20
set origin 0,0.40
plot 'input' using 1:11 with lines title 'Run-q size', '' using 1:12 with lines title 'Load Avg-5', '' using 1:14 with lines title 'Proc created/sec'
set yrange [0:*]
set size 1,0.20
set origin 0,0.20
plot 'input' using 1:9 with lines title 'Pages swapin/s', '' using 1:10 with lines title 'Pages swapout/s'
set yrange [0:*]
set size 1,0.20
set origin 0,0
plot 'input' using 1:16 with lines title 'tps', '' using 1:17 with lines title 'rtps', '' using 1:18 with lines title 'wtps'
unset multiplot
unset output

Note that the output file is output.png and that the script expects the input file to be named input and located in the current directory. The input file can be generated with:

(for i in -u -r -q "-I SUM" -c -w -W -b; do LC_TIME="en_US.UTF-8" sar -s 10:00:00 -e 18:00:01 -f /var/log/sysstat/sa18 $i| tail -n +4 | head -n -1; echo "==========="; sleep 1; done) | awk -f report.awk | tail -n +2 > input
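
Before plotting, it may be worth checking that the file really has the shape the gnuplot script expects: 18 whitespace-separated columns per line, with an HH:MM time in the first one. The following is a rough sketch of such a check. check_input is a made-up helper name, and the rules it applies are simply derived from the report layout above.

```shell
# check_input FILE
# Hypothetical sanity check: succeeds only if every line of FILE has
# 18 fields and starts with a time in HH:MM format.
check_input() {
    awk '
        NF != 18 || $1 !~ /^[0-2][0-9]:[0-5][0-9]$/ {
            printf "line %d looks wrong: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

A typical use would be: check_input input && gnuplot report.plot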

If you are producing a full 24h sar report (using -f without -s and -e), you need to remove the last line of the report (00:00 bla bla bla), as it will mess up the gnuplot graph. This can be done by adding another pipe with head -n -1 at the end of the above command, before the output redirection.
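
Since head -n -1 is a GNU extension and always drops the last line, a slightly more careful variant is sketched here in awk: it removes the last line only when that line starts with 00:00. This is an alternative to the head approach described above, not what the original command does, and drop_trailing_midnight is a name chosen for this example.

```shell
# drop_trailing_midnight: copy stdin to stdout, but leave out the final line
# if (and only if) its time column is 00:00.
drop_trailing_midnight() {
    awk '
        NR > 1 { print prev }   # print each line one step late
        { prev = $0 }
        END { if (NR > 0 && prev !~ /^00:00 /) print prev }
    '
}
```

It goes in the same place as head -n -1 would: ... | awk -f report.awk | tail -n +2 | drop_trailing_midnight > input. Note that a file consisting of a single 00:00 line would come out empty.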
Another thing to note is that the format of sar's timestamps depends on the LC_TIME environment variable (which is overridden by the LC_ALL environment variable). As long as you set it as above (LC_TIME="en_US.UTF-8") during the script run it should be OK, as the above awk script is designed for the AM/PM format. The awk script was tested on RHEL/CentOS 5.5, Ubuntu 8.04 and Debian Lenny.
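
If the report comes out in 24-hour format despite setting LC_TIME, a likely cause is that LC_ALL is set in the environment. The sketch below prints which value wins for the time category under the usual precedence (LC_ALL, then LC_TIME, then LANG). effective_lc_time is a hypothetical helper, and the fallback to C when nothing is set is an assumption that holds on typical Linux systems.

```shell
# effective_lc_time: print the locale name that applies to time formatting,
# following the precedence LC_ALL > LC_TIME > LANG.
effective_lc_time() {
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "${LC_TIME:-}" ]; then
        echo "$LC_TIME"
    else
        # nothing set: assume the default "C" locale
        echo "${LANG:-C}"
    fi
}
```

If it prints something other than en_US.UTF-8 while LC_TIME is set to that value, LC_ALL is overriding it and can be unset for the duration of the sar loop.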

Finally, run gnuplot report.plot to have a graph named output.png created in the current directory.

  • anomie

    Not a single comment on this article? Muchas gracias for putting this together. Especially the gnuplot example code you provided is very helpful.

  • wak

    Wow.. This is useful. THanks. Will try this out soon.

  • Ted

    Excelent article!

  • rpw

    Nice article. I’m using Oracle Linux 6.3 which is an RHEL derivative, and found that if I export LC_ALL="C" I get the 24 hour time, and don’t need the code to convert the time in the awk script. Have to adjust all the using parameters in the plot script as you don’t have an AM/PM column anymore.

    I export the LC_ALL="C" inside the parenthesis so that it doesn’t affect the awk script spawned after the collection.

    I tried to just export LC_TIME="C" but that didn’t work.

  • mitzone

    Excellent! Thanks for sharing!

  • U.Rau

    Great stuff. Thanks alot for sharing!

  • hannes

    Yeah, this is awesome!!!! thanks a lot, will rewrite my scripts to use multiplot!!!